Overview

Dataset statistics

Number of variables: 22
Number of observations: 3116945
Missing cells: 15868508
Missing cells (%): 23.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.7 GiB
Average record size in memory: 943.6 B
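
The missing-cell percentage follows from the counts above: missing cells divided by the total number of cells (variables times observations). A minimal sketch of that arithmetic, using only the figures reported in this block; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 22
n_observations = 3_116_945
missing_cells = 15_868_508

# Total cells = variables x observations.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```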

Variable types

Numeric: 4
Categorical: 10
Text: 8

Alerts

cap-diameter is highly overall correlated with stem-height and 1 other field [High correlation]
class is highly overall correlated with stem-root [High correlation]
stem-height is highly overall correlated with cap-diameter [High correlation]
stem-root is highly overall correlated with class [High correlation]
stem-width is highly overall correlated with cap-diameter [High correlation]
does-bruise-or-bleed is highly imbalanced (85.7%) [Imbalance]
gill-spacing is highly imbalanced (80.6%) [Imbalance]
stem-root is highly imbalanced (66.8%) [Imbalance]
veil-type is highly imbalanced (99.8%) [Imbalance]
veil-color is highly imbalanced (69.8%) [Imbalance]
has-ring is highly imbalanced (82.4%) [Imbalance]
ring-type is highly imbalanced (79.3%) [Imbalance]
spore-print-color is highly imbalanced (56.6%) [Imbalance]
cap-surface has 671023 (21.5%) missing values [Missing]
gill-attachment has 523936 (16.8%) missing values [Missing]
gill-spacing has 1258435 (40.4%) missing values [Missing]
stem-root has 2757023 (88.5%) missing values [Missing]
stem-surface has 1980861 (63.6%) missing values [Missing]
veil-type has 2957493 (94.9%) missing values [Missing]
veil-color has 2740947 (87.9%) missing values [Missing]
ring-type has 128880 (4.1%) missing values [Missing]
spore-print-color has 2849682 (91.4%) missing values [Missing]
id is uniformly distributed [Uniform]
id has unique values [Unique]
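
The imbalance percentages in these alerts are consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the category frequencies and k is the number of distinct categories. This is an inference from the reported numbers, not something the report states. The sketch below checks it for does-bruise-or-bleed, using the counts listed in that variable's section. The report shows only the ten most common values individually, so the split of the remaining 41 observations across 16 categories is a placeholder assumption (it barely affects the result), and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of category counts.

    0.0 means perfectly balanced; values near 1.0 mean one category dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Ten most common does-bruise-or-bleed values, as reported.
top_counts = [2569743, 547085, 14, 11, 9, 7, 7, 7, 7, 6]
# ASSUMPTION: only the total (41) of the other 16 values is reported;
# this particular split is illustrative.
tail_counts = [1] * 6 + [4] * 5 + [3] * 5

score = imbalance_score(top_counts + tail_counts)
print(f"{100 * score:.1f}%")
```

With 26 distinct values, of which two carry almost all observations, the score lands close to the 85.7% shown in the alert.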

Reproduction

Analysis started: 2025-12-11 09:36:47.451848
Analysis finished: 2025-12-11 09:38:47.308004
Duration: 1 minute and 59.86 seconds
Software version: ydata-profiling v4.17.0
Download configuration: config.json

Variables

id
Real number (ℝ)

Uniform  Unique 

Distinct: 3116945
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1558472
Minimum: 0
Maximum: 3116944
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 155847.2
Q1: 779236
Median: 1558472
Q3: 2337708
95-th percentile: 2961096.8
Maximum: 3116944
Range: 3116944
Interquartile range (IQR): 1558472

Descriptive statistics

Standard deviation: 899784.66
Coefficient of variation (CV): 0.57735055
Kurtosis: -1.2
Mean: 1558472
Median Absolute Deviation (MAD): 779236
Skewness: -2.5075827 × 10^-15
Sum: 4.8576715 × 10^12
Variance: 8.0961244 × 10^11
Monotonicity: Strictly increasing
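
Because id is strictly increasing with 3116945 distinct values between 0 and 3116944, it is simply the integers 0 to n - 1, and the statistics above follow from closed-form expressions for such a sequence. The kurtosis of -1.2 is likewise the excess kurtosis of a uniform distribution. A short sketch that reproduces the figures, assuming the report uses the sample (n - 1) variance, which is what the reported numbers match:

```python
import math

n = 3_116_945  # number of observations; id takes the values 0 .. n-1

mean = (n - 1) / 2            # midpoint of 0 .. n-1
variance = n * (n + 1) / 12   # sample variance (ddof=1) of 0 .. n-1
std = math.sqrt(variance)
cv = std / mean               # coefficient of variation
total = n * (n - 1) // 2      # sum of 0 .. n-1

print(mean, round(std, 2), round(cv, 8), total)
```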
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                   Count    Frequency (%)
0                       1        < 0.1%
2077967                 1        < 0.1%
2077958                 1        < 0.1%
2077959                 1        < 0.1%
2077960                 1        < 0.1%
2077961                 1        < 0.1%
2077962                 1        < 0.1%
2077963                 1        < 0.1%
2077964                 1        < 0.1%
2077965                 1        < 0.1%
Other values (3116935)  3116935  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
3116944  1      < 0.1%
3116943  1      < 0.1%
3116942  1      < 0.1%
3116941  1      < 0.1%
3116940  1      < 0.1%
3116939  1      < 0.1%
3116938  1      < 0.1%
3116937  1      < 0.1%
3116936  1      < 0.1%
3116935  1      < 0.1%

class
Categorical

High correlation 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 172.4 MiB
p: 1705396
e: 1411549

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3116945
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: e
2nd row: p
3rd row: e
4th row: e
5th row: e

Common Values

Value  Count    Frequency (%)
p      1705396  54.7%
e      1411549  45.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; the counts and frequencies match the Common Values table.]

Most occurring characters

Value  Count    Frequency (%)
p      1705396  54.7%
e      1411549  45.3%

Most occurring categories, scripts, and blocks

All 3116945 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

cap-diameter
Real number (ℝ)

High correlation 

Distinct: 3913
Distinct (%): 0.1%
Missing: 4
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3098484
Minimum: 0.03
Maximum: 80.67
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB

Quantile statistics

Minimum: 0.03
5-th percentile: 1.34
Q1: 3.32
Median: 5.75
Q3: 8.24
95-th percentile: 13.23
Maximum: 80.67
Range: 80.64
Interquartile range (IQR): 4.92

Descriptive statistics

Standard deviation: 4.6579305
Coefficient of variation (CV): 0.73820007
Kurtosis: 32.743381
Mean: 6.3098484
Median Absolute Deviation (MAD): 2.46
Skewness: 3.9726092
Sum: 19667425
Variance: 21.696317
Monotonicity: Not monotonic
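
The quantile and descriptive statistics above can be cross-checked against each other, and they help explain the high skewness and kurtosis: the maximum lies far beyond the usual Tukey upper fence, Q3 + 1.5 × IQR. The fence rule is a common convention and not something the report itself applies. The sketch below uses only the figures reported for cap-diameter.

```python
# Figures copied from the cap-diameter statistics above.
q1, q3 = 3.32, 8.24
mean, std = 6.3098484, 4.6579305
maximum = 80.67

iqr = q3 - q1                  # interquartile range
cv = std / mean                # coefficient of variation
upper_fence = q3 + 1.5 * iqr   # Tukey fence (a convention, not from the report)

print(f"IQR={iqr:.2f}  CV={cv:.4f}  upper fence={upper_fence:.2f}")
print("maximum beyond fence:", maximum > upper_fence)
```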
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count    Frequency (%)
1.49                 8164     0.3%
3.18                 7942     0.3%
3.14                 7361     0.2%
1.51                 7072     0.2%
4.04                 6828     0.2%
3.28                 6826     0.2%
2.87                 6807     0.2%
3.85                 6642     0.2%
3.24                 6634     0.2%
1.52                 6562     0.2%
Other values (3903)  3046103  97.7%

Minimum 10 values

Value  Count  Frequency (%)
0.03   1      < 0.1%
0.1    1      < 0.1%
0.3    1      < 0.1%
0.38   1      < 0.1%
0.4    6      < 0.1%
0.41   2      < 0.1%
0.42   3      < 0.1%
0.44   15     < 0.1%
0.45   2      < 0.1%
0.46   5      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
80.67  1      < 0.1%
64.46  1      < 0.1%
62.4   1      < 0.1%
62.3   1      < 0.1%
62.06  1      < 0.1%
62.01  1      < 0.1%
60.97  1      < 0.1%
59.76  1      < 0.1%
59.74  2      < 0.1%
59.66  1      < 0.1%

cap-shape
Text

Distinct: 74
Distinct (%): < 0.1%
Missing: 40
Missing (%): < 0.1%
Memory size: 172.4 MiB

Length

Max length: 9
Median length: 1
Mean length: 1.0000536
Min length: 1

Characters and Unicode

Total characters: 3117072
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 47
Unique (%): < 0.1%

Sample

1st row: f
2nd row: x
3rd row: f
4th row: f
5th row: x

Value              Count    Frequency (%)
x                  1436030  46.1%
f                  676240   21.7%
s                  365147   11.7%
b                  318647   10.2%
o                  108835   3.5%
p                  106968   3.4%
c                  104520   3.4%
d                  65       < 0.1%
e                  60       < 0.1%
n                  41       < 0.1%
Other values (62)  360      < 0.1%

Most occurring characters

Value              Count    Frequency (%)
x                  1436030  46.1%
f                  676240   21.7%
s                  365149   11.7%
b                  318647   10.2%
o                  108835   3.5%
p                  106969   3.4%
c                  104520   3.4%
d                  65       < 0.1%
e                  61       < 0.1%
.                  44       < 0.1%
Other values (26)  512      < 0.1%

Most occurring categories, scripts, and blocks

All 3117072 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

cap-surface
Text

Missing 

Distinct: 83
Distinct (%): < 0.1%
Missing: 671023
Missing (%): 21.5%
Memory size: 155.8 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001402
Min length: 1

Characters and Unicode

Total characters: 2446265
Distinct characters: 38
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 54
Unique (%): < 0.1%

Sample

1st row: s
2nd row: h
3rd row: s
4th row: y
5th row: l

Value              Count   Frequency (%)
t                  460779  18.8%
s                  384970  15.7%
y                  327827  13.4%
h                  284463  11.6%
g                  263729  10.8%
d                  206832  8.5%
k                  128876  5.3%
e                  119712  4.9%
i                  113440  4.6%
w                  109840  4.5%
Other values (68)  45465   1.9%

Most occurring characters

Value              Count   Frequency (%)
t                  460785  18.8%
s                  385005  15.7%
y                  327831  13.4%
h                  284466  11.6%
g                  263735  10.8%
d                  206841  8.5%
k                  128876  5.3%
e                  119741  4.9%
i                  113454  4.6%
w                  109840  4.5%
Other values (28)  45691   1.9%

Most occurring categories, scripts, and blocks

All 2446265 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

cap-color
Text

Distinct: 78
Distinct (%): < 0.1%
Missing: 12
Missing (%): < 0.1%
Memory size: 172.4 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001011
Min length: 1

Characters and Unicode

Total characters: 3117248
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 49
Unique (%): < 0.1%

Sample

1st row: u
2nd row: o
3rd row: b
4th row: g
5th row: w

Value              Count    Frequency (%)
n                  1359544  43.6%
y                  386627   12.4%
w                  379442   12.2%
g                  210825   6.8%
e                  197290   6.3%
o                  178847   5.7%
p                  91838    2.9%
r                  78236    2.5%
u                  73172    2.3%
b                  61313    2.0%
Other values (68)  99801    3.2%

Most occurring characters

Value              Count    Frequency (%)
n                  1359556  43.6%
y                  386633   12.4%
w                  379442   12.2%
g                  210831   6.8%
e                  197314   6.3%
o                  178860   5.7%
p                  91844    2.9%
r                  78248    2.5%
u                  73175    2.3%
b                  61317    2.0%
Other values (27)  100028   3.2%

Most occurring categories, scripts, and blocks

All 3117248 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

does-bruise-or-bleed
Categorical

Imbalance 

Distinct: 26
Distinct (%): < 0.1%
Missing: 8
Missing (%): < 0.1%
Memory size: 172.4 MiB
f: 2569743
t: 547085
w: 14
c: 11
h: 9
Other values (21): 75

Length

Max length: 8
Median length: 1
Mean length: 1.0000048
Min length: 1

Characters and Unicode

Total characters: 3116952
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: f
2nd row: f
3rd row: f
4th row: f
5th row: f

Common Values

Value              Count    Frequency (%)
f                  2569743  82.4%
t                  547085   17.6%
w                  14       < 0.1%
c                  11       < 0.1%
h                  9        < 0.1%
y                  7        < 0.1%
a                  7        < 0.1%
b                  7        < 0.1%
x                  7        < 0.1%
s                  6        < 0.1%
Other values (16)  41       < 0.1%
(Missing)          8        < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: common values plot; the counts and frequencies match the Common Values table, excluding the (Missing) row.]

Most occurring characters

Value              Count    Frequency (%)
f                  2569743  82.4%
t                  547085   17.6%
w                  14       < 0.1%
c                  11       < 0.1%
h                  10       < 0.1%
a                  8        < 0.1%
y                  7        < 0.1%
b                  7        < 0.1%
x                  7        < 0.1%
s                  7        < 0.1%
Other values (18)  53       < 0.1%

Most occurring categories, scripts, and blocks

All 3116952 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

gill-attachment
Text

Missing 

Distinct: 78
Distinct (%): < 0.1%
Missing: 523936
Missing (%): 16.8%
Memory size: 159.4 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0000891
Min length: 1

Characters and Unicode

Total characters: 2593240
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 53
Unique (%): < 0.1%

Sample

1st row: a
2nd row: a
3rd row: x
4th row: s
5th row: d

Value              Count   Frequency (%)
a                  646035  24.9%
d                  589237  22.7%
x                  360878  13.9%
e                  301858  11.6%
s                  295439  11.4%
p                  279112  10.8%
f                  119956  4.6%
c                  74      < 0.1%
u                  56      < 0.1%
w                  37      < 0.1%
Other values (64)  334     < 0.1%

Most occurring characters

Value              Count   Frequency (%)
a                  646042  24.9%
d                  589242  22.7%
x                  360878  13.9%
e                  301872  11.6%
s                  295458  11.4%
p                  279113  10.8%
f                  119956  4.6%
c                  74      < 0.1%
u                  57      < 0.1%
.                  44      < 0.1%
Other values (27)  504     < 0.1%

Most occurring categories, scripts, and blocks

All 2593240 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

gill-spacing
Categorical

Imbalance  Missing 

Distinct: 48
Distinct (%): < 0.1%
Missing: 1258435
Missing (%): 40.4%
Memory size: 170.0 MiB
c: 1331054
d: 407932
f: 119380
e: 24
a: 17
Other values (43): 103

Length

Max length: 11
Median length: 1
Mean length: 1.00005
Min length: 1

Characters and Unicode

Total characters: 1858603
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 30
Unique (%): < 0.1%

Sample

1st row: c
2nd row: c
3rd row: c
4th row: c
5th row: c

Common Values

Value              Count    Frequency (%)
c                  1331054  42.7%
d                  407932   13.1%
f                  119380   3.8%
e                  24       < 0.1%
a                  17       < 0.1%
s                  16       < 0.1%
b                  12       < 0.1%
x                  8        < 0.1%
t                  8        < 0.1%
p                  7        < 0.1%
Other values (38)  52       < 0.1%
(Missing)          1258435  40.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Frequencies in this table are relative to the non-missing values.

Value              Count    Frequency (%)
c                  1331054  71.6%
d                  407932   21.9%
f                  119381   6.4%
e                  24       < 0.1%
a                  17       < 0.1%
s                  16       < 0.1%
b                  12       < 0.1%
x                  8        < 0.1%
t                  8        < 0.1%
p                  7        < 0.1%
Other values (38)  52       < 0.1%

Most occurring characters

Value              Count    Frequency (%)
c                  1331057  71.6%
d                  407933   21.9%
f                  119382   6.4%
e                  26       < 0.1%
.                  25       < 0.1%
a                  20       < 0.1%
s                  20       < 0.1%
b                  12       < 0.1%
2                  10       < 0.1%
3                  10       < 0.1%
Other values (24)  108      < 0.1%

Most occurring categories, scripts, and blocks

All 1858603 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

gill-color
Text

Distinct: 63
Distinct (%): < 0.1%
Missing: 57
Missing (%): < 0.1%
Memory size: 172.4 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001078
Min length: 1

Characters and Unicode

Total characters: 3117224
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 31
Unique (%): < 0.1%

Sample

1st row: w
2nd row: n
3rd row: w
4th row: g
5th row: w

Value              Count   Frequency (%)
w                  931539  29.9%
n                  543387  17.4%
y                  469466  15.1%
p                  343626  11.0%
g                  212164  6.8%
o                  157119  5.0%
k                  127970  4.1%
f                  119694  3.8%
r                  62799   2.0%
e                  56048   1.8%
Other values (51)  93080   3.0%

Most occurring characters

Value              Count   Frequency (%)
w                  931539  29.9%
n                  543409  17.4%
y                  469472  15.1%
p                  343642  11.0%
g                  212176  6.8%
o                  157141  5.0%
k                  127970  4.1%
f                  119694  3.8%
r                  62819   2.0%
e                  56072   1.8%
Other values (27)  93290   3.0%

Most occurring categories, scripts, and blocks

All 3117224 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

stem-height
Real number (ℝ)

High correlation 

Distinct: 2749
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3483333
Minimum: 0
Maximum: 88.72
Zeros: 554
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.16
Q1: 4.67
Median: 5.88
Q3: 7.41
95-th percentile: 11.2
Maximum: 88.72
Range: 88.72
Interquartile range (IQR): 2.74

Descriptive statistics

Standard deviation: 2.6997548
Coefficient of variation (CV): 0.42526985
Kurtosis: 7.7615498
Mean: 6.3483333
Median Absolute Deviation (MAD): 1.33
Skewness: 1.9266817
Sum: 19787406
Variance: 7.288676
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count    Frequency (%)
5.24                 12332    0.4%
5.92                 11821    0.4%
5.32                 10988    0.4%
5.35                 10431    0.3%
5.99                 10402    0.3%
6.03                 10271    0.3%
5.54                 10265    0.3%
5.77                 10153    0.3%
4.27                 10080    0.3%
5.65                 9994     0.3%
Other values (2739)  3010208  96.6%

Minimum 10 values

Value  Count  Frequency (%)
0      554    < 0.1%
0.74   1      < 0.1%
0.77   1      < 0.1%
0.91   1      < 0.1%
0.93   1      < 0.1%
0.97   2      < 0.1%
0.98   1      < 0.1%
1      1      < 0.1%
1.01   1      < 0.1%
1.03   1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
88.72  1      < 0.1%
57.22  1      < 0.1%
53.93  1      < 0.1%
53.87  1      < 0.1%
53.82  1      < 0.1%
53.03  1      < 0.1%
51.41  1      < 0.1%
50.78  1      < 0.1%
50.27  1      < 0.1%
49.37  1      < 0.1%

stem-width
Real number (ℝ)

High correlation 

Distinct: 5836
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.153785
Minimum: 0
Maximum: 102.9
Zeros: 497
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.58
Q1: 4.97
Median: 9.65
Q3: 15.63
95-th percentile: 26.49
Maximum: 102.9
Range: 102.9
Interquartile range (IQR): 10.66

Descriptive statistics

Standard deviation: 8.0954773
Coefficient of variation (CV): 0.72580538
Kurtosis: 2.4489761
Mean: 11.153785
Median Absolute Deviation (MAD): 5.24
Skewness: 1.2354271
Sum: 34765735
Variance: 65.536753
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count    Frequency (%)
2.41                 7829     0.3%
2.45                 7353     0.2%
2.49                 7087     0.2%
2.56                 6824     0.2%
2.47                 6709     0.2%
2.52                 6660     0.2%
2.51                 6535     0.2%
2.64                 6467     0.2%
2.6                  6366     0.2%
2.61                 6117     0.2%
Other values (5826)  3048998  97.8%

Minimum 10 values

Value  Count  Frequency (%)
0      497    < 0.1%
0.44   2      < 0.1%
0.48   2      < 0.1%
0.49   1      < 0.1%
0.5    3      < 0.1%
0.51   1      < 0.1%
0.52   21     < 0.1%
0.53   16     < 0.1%
0.54   11     < 0.1%
0.55   12     < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
102.9   1      < 0.1%
102.48  6      < 0.1%
101.69  3      < 0.1%
98      2      < 0.1%
94.24   3      < 0.1%
94.05   1      < 0.1%
92.51   1      < 0.1%
91.91   1      < 0.1%
91      1      < 0.1%
89.45   2      < 0.1%

stem-root
Categorical

High correlation  Imbalance  Missing 

Distinct: 38
Distinct (%): < 0.1%
Missing: 2757023
Missing (%): 88.5%
Memory size: 167.1 MiB
b: 165801
s: 116946
r: 47803
c: 28592
f: 597
Other values (33): 183

Length

Max length: 17
Median length: 1
Mean length: 1.0001778
Min length: 1

Characters and Unicode

Total characters: 359986
Distinct characters: 35
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 15
Unique (%): < 0.1%

Sample

1st row: b
2nd row: b
3rd row: c
4th row: b
5th row: r

Common Values

Value              Count    Frequency (%)
b                  165801   5.3%
s                  116946   3.8%
r                  47803    1.5%
c                  28592    0.9%
f                  597      < 0.1%
d                  24       < 0.1%
y                  14       < 0.1%
w                  12       < 0.1%
p                  12       < 0.1%
g                  12       < 0.1%
Other values (28)  109      < 0.1%
(Missing)          2757023  88.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Frequencies in this table are relative to the non-missing values.

Value              Count   Frequency (%)
b                  165801  46.1%
s                  116946  32.5%
r                  47803   13.3%
c                  28592   7.9%
f                  597     0.2%
d                  24      < 0.1%
y                  14      < 0.1%
p                  12      < 0.1%
g                  12      < 0.1%
w                  12      < 0.1%
Other values (28)  109     < 0.1%

Most occurring characters

Value              Count   Frequency (%)
b                  165801  46.1%
s                  116947  32.5%
r                  47806   13.3%
c                  28593   7.9%
f                  597     0.2%
d                  24      < 0.1%
p                  14      < 0.1%
.                  14      < 0.1%
y                  14      < 0.1%
w                  12      < 0.1%
Other values (25)  164     < 0.1%

Most occurring categories, scripts, and blocks

All 359986 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

stem-surface
Text

Missing 

Distinct: 60
Distinct (%): < 0.1%
Missing: 1980861
Missing (%): 63.6%
Memory size: 123.3 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0001919
Min length: 1

Characters and Unicode

Total characters: 1136302
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 32
Unique (%): < 0.1%

Sample

1st row: y
2nd row: s
3rd row: s
4th row: t
5th row: s

Value              Count   Frequency (%)
s                  327611  28.8%
y                  255500  22.5%
i                  224346  19.7%
t                  147974  13.0%
g                  78080   6.9%
k                  73383   6.5%
h                  28284   2.5%
f                  512     < 0.1%
w                  49      < 0.1%
d                  48      < 0.1%
Other values (50)  300     < 0.1%

Most occurring characters

Value              Count   Frequency (%)
s                  327634  28.8%
y                  255500  22.5%
i                  224350  19.7%
t                  147975  13.0%
g                  78081   6.9%
k                  73383   6.5%
h                  28286   2.5%
f                  512     < 0.1%
d                  54      < 0.1%
e                  54      < 0.1%
Other values (27)  473     < 0.1%

Most occurring categories, scripts, and blocks

All 1136302 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

stem-color
Text

Distinct: 59
Distinct (%): < 0.1%
Missing: 38
Missing (%): < 0.1%
Memory size: 172.4 MiB

Length

Max length: 17
Median length: 1
Mean length: 1.0000539
Min length: 1

Characters and Unicode

Total characters: 3117075
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 33
Unique (%): < 0.1%

Sample

1st row: w
2nd row: o
3rd row: n
4th row: w
5th row: w

Value              Count    Frequency (%)
w                  1196638  38.4%
n                  1003466  32.2%
y                  373971   12.0%
g                  132019   4.2%
o                  111541   3.6%
e                  103374   3.3%
u                  67017    2.2%
p                  54690    1.8%
k                  33676    1.1%
r                  22329    0.7%
Other values (47)  18189    0.6%

Most occurring characters

Value              Count    Frequency (%)
w                  1196638  38.4%
n                  1003471  32.2%
y                  373974   12.0%
g                  132022   4.2%
o                  111547   3.6%
e                  103379   3.3%
u                  67017    2.1%
p                  54697    1.8%
k                  33676    1.1%
r                  22338    0.7%
Other values (26)  18316    0.6%

Most occurring categories, scripts, and blocks

All 3117075 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

veil-type
Categorical

Imbalance  Missing 

Distinct: 22
Distinct (%): < 0.1%
Missing: 2957493
Missing (%): 94.9%
Memory size: 166.8 MiB
u: 159373
w: 11
a: 9
e: 8
f: 8
Other values (17): 43

Length

Max length: 7
Median length: 1
Mean length: 1.0000815
Min length: 1

Characters and Unicode

Total characters: 159465
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: u
2nd row: u
3rd row: u
4th row: u
5th row: u

Common Values

Value              Count    Frequency (%)
u                  159373   5.1%
w                  11       < 0.1%
a                  9        < 0.1%
e                  8        < 0.1%
f                  8        < 0.1%
b                  5        < 0.1%
c                  5        < 0.1%
g                  4        < 0.1%
y                  4        < 0.1%
k                  4        < 0.1%
Other values (12)  21       < 0.1%
(Missing)          2957493  94.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Frequencies in this table are relative to the non-missing values.

Value              Count   Frequency (%)
u                  159373  99.9%
w                  11      < 0.1%
a                  9       < 0.1%
e                  8       < 0.1%
f                  8       < 0.1%
b                  5       < 0.1%
c                  5       < 0.1%
g                  4       < 0.1%
y                  4       < 0.1%
k                  4       < 0.1%
Other values (13)  22      < 0.1%

Most occurring characters

Value              Count   Frequency (%)
u                  159373  99.9%
w                  11      < 0.1%
a                  9       < 0.1%
e                  9       < 0.1%
f                  8       < 0.1%
b                  5       < 0.1%
c                  5       < 0.1%
g                  4       < 0.1%
y                  4       < 0.1%
k                  4       < 0.1%
Other values (18)  33      < 0.1%

Most occurring categories, scripts, and blocks

All 159465 characters fall into a single (unknown) category, script, and block (100.0%). The tables of the most frequent characters per category, per script, and per block repeat the "Most occurring characters" table.

veil-color
Categorical

Imbalance  Missing 

Distinct: 24
Distinct (%): < 0.1%
Missing: 2740947
Missing (%): 87.9%
Memory size: 167.2 MiB

Value | Count
w | 279070
y | 30473
n | 30039
u | 14026
k | 13080
Other values (19) | 9310

Length

Max length: 4
Median length: 1
Mean length: 1.0000239
Min length: 1

Characters and Unicode

Total characters: 376007
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: n
2nd row: w
3rd row: w
4th row: w
5th row: n

Common Values

Value | Count | Frequency (%)
w | 279070 | 9.0%
y | 30473 | 1.0%
n | 30039 | 1.0%
u | 14026 | 0.4%
k | 13080 | 0.4%
e | 9169 | 0.3%
g | 30 | < 0.1%
p | 23 | < 0.1%
r | 14 | < 0.1%
o | 13 | < 0.1%
Other values (14) | 61 | < 0.1%
(Missing) | 2740947 | 87.9%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
w | 279070 | 74.2%
y | 30473 | 8.1%
n | 30039 | 8.0%
u | 14026 | 3.7%
k | 13080 | 3.5%
e | 9169 | 2.4%
g | 30 | < 0.1%
p | 23 | < 0.1%
r | 14 | < 0.1%
o | 13 | < 0.1%
Other values (14) | 61 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
w | 279070 | 74.2%
y | 30473 | 8.1%
n | 30039 | 8.0%
u | 14026 | 3.7%
k | 13080 | 3.5%
e | 9169 | 2.4%
g | 30 | < 0.1%
p | 23 | < 0.1%
r | 14 | < 0.1%
o | 13 | < 0.1%
Other values (18) | 70 | < 0.1%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown): 376007 characters (100.0%). The most frequent characters per category, per script, and per block are therefore identical to the Most occurring characters table above.

has-ring
Categorical

Imbalance 

Distinct: 23
Distinct (%): < 0.1%
Missing: 24
Missing (%): < 0.1%
Memory size: 172.4 MiB

Value | Count
f | 2368820
t | 747982
r | 16
h | 13
c | 11
Other values (18) | 79

Length

Max length: 10
Median length: 1
Mean length: 1.0000038
Min length: 1

Characters and Unicode

Total characters: 3116933
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: f
2nd row: t
3rd row: f
4th row: f
5th row: f

Common Values

Value | Count | Frequency (%)
f | 2368820 | 76.0%
t | 747982 | 24.0%
r | 16 | < 0.1%
h | 13 | < 0.1%
c | 11 | < 0.1%
l | 11 | < 0.1%
s | 11 | < 0.1%
p | 11 | < 0.1%
g | 8 | < 0.1%
z | 6 | < 0.1%
Other values (13) | 32 | < 0.1%
(Missing) | 24 | < 0.1%
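
The Imbalance flag on this variable can be made concrete with a small calculation. The sketch below assumes the score is one minus the Shannon entropy of the value counts, normalized by the log of the number of distinct values; that definition is an assumption, not something the report states. The function name `imbalance_score` and its `n_classes` parameter are hypothetical. Only the ten listed counts are used, because the individual counts behind "Other values (13)" are not shown, while `n_classes=23` is taken from the Distinct figure.

```python
import math


def imbalance_score(counts, n_classes=None):
    """Return 1 - H / log2(n), where H is the Shannon entropy of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    n = n_classes if n_classes is not None else len(counts)
    if n <= 1 or total == 0:
        # A single class (or no data) has no meaningful balance; report 0.0.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n)


# Counts of the ten most common has-ring values listed above.
has_ring_counts = [2368820, 747982, 16, 13, 11, 11, 11, 11, 8, 6]
score = imbalance_score(has_ring_counts, n_classes=23)
print(f"{score:.3f}")
```

Under these assumptions the score comes out at roughly 0.82. The two dominant values `f` and `t` carry almost all of the entropy, so the many rare values mainly raise the normalizing term and push the score up.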

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
f | 2368821 | 76.0%
t | 747982 | 24.0%
r | 16 | < 0.1%
h | 13 | < 0.1%
c | 11 | < 0.1%
l | 11 | < 0.1%
s | 11 | < 0.1%
p | 11 | < 0.1%
g | 8 | < 0.1%
z | 6 | < 0.1%
Other values (13) | 32 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
f | 2368821 | 76.0%
t | 747982 | 24.0%
r | 17 | < 0.1%
h | 14 | < 0.1%
s | 12 | < 0.1%
c | 11 | < 0.1%
l | 11 | < 0.1%
p | 11 | < 0.1%
g | 9 | < 0.1%
z | 6 | < 0.1%
Other values (17) | 39 | < 0.1%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown): 3116933 characters (100.0%). The most frequent characters per category, per script, and per block are therefore identical to the Most occurring characters table above.

ring-type
Categorical

Imbalance  Missing 

Distinct: 40
Distinct (%): < 0.1%
Missing: 128880
Missing (%): 4.1%
Memory size: 172.2 MiB

Value | Count
f | 2477170
e | 120006
z | 113780
l | 73443
r | 67909
Other values (35) | 135757

Length

Max length: 20
Median length: 1
Mean length: 1.0000472
Min length: 1

Characters and Unicode

Total characters: 2988206
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: f
2nd row: z
3rd row: f
4th row: f
5th row: f

Common Values

Value | Count | Frequency (%)
f | 2477170 | 79.5%
e | 120006 | 3.9%
z | 113780 | 3.7%
l | 73443 | 2.4%
r | 67909 | 2.2%
p | 67678 | 2.2%
g | 63687 | 2.0%
m | 3992 | 0.1%
t | 98 | < 0.1%
d | 37 | < 0.1%
Other values (30) | 265 | < 0.1%
(Missing) | 128880 | 4.1%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
f | 2477173 | 82.9%
e | 120006 | 4.0%
z | 113780 | 3.8%
l | 73443 | 2.5%
r | 67909 | 2.3%
p | 67678 | 2.3%
g | 63687 | 2.1%
m | 3992 | 0.1%
t | 98 | < 0.1%
d | 37 | < 0.1%
Other values (30) | 265 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
f | 2477173 | 82.9%
e | 120024 | 4.0%
z | 113780 | 3.8%
l | 73446 | 2.5%
r | 67921 | 2.3%
p | 67688 | 2.3%
g | 63694 | 2.1%
m | 3992 | 0.1%
t | 106 | < 0.1%
n | 45 | < 0.1%
Other values (24) | 337 | < 0.1%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown): 2988206 characters (100.0%). The most frequent characters per category, per script, and per block are therefore identical to the Most occurring characters table above.

spore-print-color
Categorical

Imbalance  Missing 

Distinct: 32
Distinct (%): < 0.1%
Missing: 2849682
Missing (%): 91.4%
Memory size: 167.0 MiB

Value | Count
k | 107310
p | 68237
w | 50173
n | 22646
r | 7975
Other values (27) | 10922

Length

Max length: 10
Median length: 1
Mean length: 1.0001983
Min length: 1

Characters and Unicode

Total characters: 267316
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: k
2nd row: w
3rd row: k
4th row: k
5th row: p

Common Values

Value | Count | Frequency (%)
k | 107310 | 3.4%
p | 68237 | 2.2%
w | 50173 | 1.6%
n | 22646 | 0.7%
r | 7975 | 0.3%
u | 7256 | 0.2%
g | 3492 | 0.1%
y | 36 | < 0.1%
s | 21 | < 0.1%
c | 16 | < 0.1%
Other values (22) | 101 | < 0.1%
(Missing) | 2849682 | 91.4%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
k | 107310 | 40.2%
p | 68237 | 25.5%
w | 50173 | 18.8%
n | 22646 | 8.5%
r | 7975 | 3.0%
u | 7256 | 2.7%
g | 3492 | 1.3%
y | 36 | < 0.1%
s | 21 | < 0.1%
c | 16 | < 0.1%
Other values (23) | 103 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
k | 107310 | 40.1%
p | 68237 | 25.5%
w | 50173 | 18.8%
n | 22649 | 8.5%
r | 7977 | 3.0%
u | 7256 | 2.7%
g | 3492 | 1.3%
y | 36 | < 0.1%
s | 25 | < 0.1%
c | 19 | < 0.1%
Other values (26) | 142 | 0.1%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown): 267316 characters (100.0%). The most frequent characters per category, per script, and per block are therefore identical to the Most occurring characters table above.

habitat
Text

Distinct: 52
Distinct (%): < 0.1%
Missing: 45
Missing (%): < 0.1%
Memory size: 172.4 MiB

Length

Max length: 20
Median length: 1
Mean length: 1.0000677
Min length: 1

Characters and Unicode

Total characters: 3117111
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 25
Unique (%): < 0.1%

Sample

1st row: d
2nd row: d
3rd row: l
4th row: d
5th row: g

Value | Count | Frequency (%)
d | 2177573 | 69.9%
g | 454908 | 14.6%
l | 171892 | 5.5%
m | 150969 | 4.8%
h | 120138 | 3.9%
w | 18531 | 0.6%
p | 17180 | 0.6%
u | 5264 | 0.2%
e | 55 | < 0.1%
s | 52 | < 0.1%
Other values (41) | 340 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
d | 2177576 | 69.9%
g | 454910 | 14.6%
l | 171900 | 5.5%
m | 150970 | 4.8%
h | 120143 | 3.9%
w | 18531 | 0.6%
p | 17190 | 0.6%
u | 5265 | 0.2%
e | 68 | < 0.1%
s | 65 | < 0.1%
Other values (27) | 493 | < 0.1%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown): 3117111 characters (100.0%). The most frequent characters per category, per script, and per block are therefore identical to the Most occurring characters table above.

season
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 172.4 MiB

Value | Count
a | 1543321
u | 1153588
w | 278189
s | 141847

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3116945
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: w
3rd row: w
4th row: u
5th row: a

Common Values

Value | Count | Frequency (%)
a | 1543321 | 49.5%
u | 1153588 | 37.0%
w | 278189 | 8.9%
s | 141847 | 4.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

Value | Count | Frequency (%)
a | 1543321 | 49.5%
u | 1153588 | 37.0%
w | 278189 | 8.9%
s | 141847 | 4.6%

Most occurring characters

Value | Count | Frequency (%)
a | 1543321 | 49.5%
u | 1153588 | 37.0%
w | 278189 | 8.9%
s | 141847 | 4.6%

Most occurring categories, scripts, and blocks

Every character falls into a single category, script, and block, each reported as (unknown): 3116945 characters (100.0%). The most frequent characters per category, per script, and per block are therefore identical to the Most occurring characters table above.

Interactions

[Figures: 16 interaction plots]

Correlations

[Figure: Correlation heatmap]

Variable | cap-diameter | class | does-bruise-or-bleed | gill-spacing | has-ring | id | ring-type | season | spore-print-color | stem-height | stem-root | stem-width | veil-color | veil-type
cap-diameter | 1.000 | 0.158 | 0.110 | 0.064 | 0.050 | 0.000 | 0.094 | 0.091 | 0.275 | 0.512 | 0.337 | 0.883 | 0.113 | 0.000
class | 0.158 | 1.000 | 0.038 | 0.140 | 0.050 | 0.000 | 0.197 | 0.149 | 0.426 | 0.073 | 0.521 | 0.218 | 0.496 | 0.003
does-bruise-or-bleed | 0.110 | 0.038 | 1.000 | 0.035 | 0.009 | 0.000 | 0.043 | 0.092 | 0.085 | 0.044 | 0.109 | 0.106 | 0.178 | 0.000
gill-spacing | 0.064 | 0.140 | 0.035 | 1.000 | 0.048 | 0.001 | 0.044 | 0.155 | 0.405 | 0.030 | 0.197 | 0.088 | 0.115 | 0.167
has-ring | 0.050 | 0.050 | 0.009 | 0.048 | 1.000 | 0.000 | 0.194 | 0.023 | 0.177 | 0.105 | 0.085 | 0.079 | 0.142 | 0.000
id | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002
ring-type | 0.094 | 0.197 | 0.043 | 0.044 | 0.194 | 0.000 | 1.000 | 0.070 | 0.261 | 0.227 | 0.142 | 0.121 | 0.180 | 0.072
season | 0.091 | 0.149 | 0.092 | 0.155 | 0.023 | 0.000 | 0.070 | 1.000 | 0.213 | 0.051 | 0.147 | 0.073 | 0.147 | 0.000
spore-print-color | 0.275 | 0.426 | 0.085 | 0.405 | 0.177 | 0.000 | 0.261 | 0.213 | 1.000 | 0.095 | 0.344 | 0.360 | 0.271 | 0.152
stem-height | 0.512 | 0.073 | 0.044 | 0.030 | 0.105 | 0.000 | 0.227 | 0.051 | 0.095 | 1.000 | 0.246 | 0.449 | 0.195 | 0.006
stem-root | 0.337 | 0.521 | 0.109 | 0.197 | 0.085 | 0.000 | 0.142 | 0.147 | 0.344 | 0.246 | 1.000 | 0.237 | 0.327 | 0.080
stem-width | 0.883 | 0.218 | 0.106 | 0.088 | 0.079 | 0.000 | 0.121 | 0.073 | 0.360 | 0.449 | 0.237 | 1.000 | 0.211 | 0.010
veil-color | 0.113 | 0.496 | 0.178 | 0.115 | 0.142 | 0.000 | 0.180 | 0.147 | 0.271 | 0.195 | 0.327 | 0.211 | 1.000 | 0.390
veil-type | 0.000 | 0.003 | 0.000 | 0.167 | 0.000 | 0.002 | 0.072 | 0.000 | 0.152 | 0.006 | 0.080 | 0.010 | 0.390 | 1.000
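
To show how the correlation matrix relates to the High correlation alerts, the sketch below picks out variable pairs whose coefficient meets a cutoff. The 0.5 threshold is an assumption; it is consistent with the alerts at the top of the report, but the report does not state it. The helper name `high_pairs` is hypothetical, and only a subset of the matrix (upper triangle, six variables) is copied in.

```python
# Upper-triangle entries copied from the correlation matrix above (subset of variables).
corr = {
    ("cap-diameter", "class"): 0.158,
    ("cap-diameter", "stem-height"): 0.512,
    ("cap-diameter", "stem-root"): 0.337,
    ("cap-diameter", "stem-width"): 0.883,
    ("cap-diameter", "veil-color"): 0.113,
    ("class", "stem-height"): 0.073,
    ("class", "stem-root"): 0.521,
    ("class", "stem-width"): 0.218,
    ("class", "veil-color"): 0.496,
    ("stem-height", "stem-root"): 0.246,
    ("stem-height", "stem-width"): 0.449,
    ("stem-height", "veil-color"): 0.195,
    ("stem-root", "stem-width"): 0.237,
    ("stem-root", "veil-color"): 0.327,
    ("stem-width", "veil-color"): 0.211,
}


def high_pairs(pairs, threshold=0.5):
    """Return (a, b, value) tuples at or above the threshold, strongest first."""
    selected = [(a, b, v) for (a, b), v in pairs.items() if v >= threshold]
    return sorted(selected, key=lambda item: item[2], reverse=True)


flagged = high_pairs(corr)
for a, b, value in flagged:
    print(f"{a} ~ {b}: {value:.3f}")
```

With this cutoff the flagged pairs are cap-diameter with stem-width and stem-height, and class with stem-root; class with veil-color (0.496) falls just below it.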

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
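
Nullity correlation can be sketched as the Pearson correlation between the missing-value indicators of two columns. The code below is an illustration of that idea, not the report's implementation. The function name `nullity_correlation`, the use of `None` as the missing marker, and the sample columns are all hypothetical.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns.

    Returns NaN when either column is entirely missing or entirely present,
    because a constant indicator has no variance.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical values; None marks a missing cell.
veil_type = [None, None, "u", None, "u", None]
veil_color = [None, None, "w", None, "w", "n"]
r = nullity_correlation(veil_type, veil_color)
print(f"{r:.3f}")
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means the missingness patterns are unrelated.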

Sample

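index | id | class | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season
0 | 0 | e | 8.80 | f | s | u | f | a | c | w | 4.51 | 15.39 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | a
1 | 1 | p | 4.51 | x | h | o | f | a | c | n | 4.79 | 6.48 | NaN | y | o | NaN | NaN | t | z | NaN | d | w
2 | 2 | e | 6.94 | f | s | b | f | x | c | w | 6.85 | 9.93 | NaN | s | n | NaN | NaN | f | f | NaN | l | w
3 | 3 | e | 3.88 | f | y | g | f | s | NaN | g | 4.16 | 6.53 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | u
4 | 4 | e | 5.85 | x | l | w | f | d | NaN | w | 3.37 | 8.36 | NaN | NaN | w | NaN | NaN | f | f | NaN | g | a
5 | 5 | p | 4.30 | x | t | n | f | s | c | n | 5.91 | 8.20 | NaN | NaN | w | NaN | n | t | z | NaN | d | a
6 | 6 | e | 9.65 | p | y | w | f | e | c | k | 19.07 | 12.69 | NaN | s | w | NaN | NaN | t | e | NaN | g | w
7 | 7 | p | 4.55 | x | e | e | f | a | NaN | y | 8.31 | 9.77 | NaN | NaN | y | NaN | w | t | z | NaN | d | a
8 | 8 | p | 7.36 | f | h | e | f | x | d | w | 5.77 | 17.13 | b | NaN | w | NaN | NaN | f | f | NaN | d | a
9 | 9 | e | 6.45 | x | t | n | f | a | d | w | 7.13 | 12.77 | NaN | NaN | e | NaN | NaN | f | f | NaN | d | a

index | id | class | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | stem-width | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season
3116935 | 3116935 | p | 14.58 | x | d | n | f | p | NaN | p | 14.78 | 35.76 | s | y | w | NaN | NaN | f | f | NaN | d | a
3116936 | 3116936 | p | 1.70 | x | k | n | f | NaN | NaN | n | 4.77 | 1.61 | NaN | NaN | n | NaN | NaN | f | f | NaN | d | w
3116937 | 3116937 | p | 0.69 | x | g | o | f | NaN | NaN | y | 3.51 | 0.73 | NaN | NaN | y | NaN | NaN | f | f | NaN | d | u
3116938 | 3116938 | p | 9.08 | s | t | p | t | d | c | p | 8.07 | 14.70 | NaN | NaN | p | NaN | NaN | t | f | NaN | d | a
3116939 | 3116939 | p | 9.30 | o | NaN | e | f | f | f | f | 3.42 | 25.38 | NaN | g | n | NaN | NaN | f | f | NaN | d | u
3116940 | 3116940 | e | 9.29 | f | NaN | n | t | NaN | NaN | w | 12.14 | 18.81 | b | NaN | w | u | w | t | g | NaN | d | u
3116941 | 3116941 | e | 10.88 | s | NaN | w | t | d | c | p | 6.65 | 26.97 | NaN | NaN | w | NaN | NaN | f | f | NaN | d | u
3116942 | 3116942 | p | 7.82 | x | e | e | f | a | NaN | w | 9.51 | 11.06 | NaN | NaN | y | NaN | w | t | z | NaN | d | a
3116943 | 3116943 | e | 9.45 | p | i | n | t | e | NaN | p | 9.13 | 17.77 | NaN | y | w | NaN | NaN | t | p | NaN | d | u
3116944 | 3116944 | p | 3.20 | x | s | g | f | d | c | w | 2.82 | 7.79 | NaN | NaN | w | NaN | NaN | f | f | NaN | g | u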